Abstract

This paper describes the participation of UCD IIRG in the TREC 2012
Medical Records track, which fosters research in the retrieval of
electronic health records using free text fields. Our contributions to
this track investigate several problem areas in the retrieval of medical
documents. Multiple knowledge sources are investigated to alleviate the
issue of vocabulary mismatch. Medical records are verbose documents that
give a full picture of a patient's medical status including their family
health information and their own medical history. A Condition
Attribution and Temporal Grounding system is implemented to address such
occurrences. A rule-based system is employed in order to extract the
patient's demographic information from their medical record. All
extracted information is then leveraged using Indri's structured query
language. These methods are combined to identify patients who fit the
exact criteria as described in natural language queries.

1  Introduction 

The TREC 2012 Medical Records Track fosters research in the retrieval of
electronic health records using free text fields. These free text fields
outline a set of inclusion criteria that specify a patient cohort, and
must be satisfied in order for a record to be relevant. These criteria
include demographic information (for example a patient's age, gender and
ethnicity) as well as their medical conditions or treatments.

The comprehensive nature of medical records poses numerous problems in
accurately identifying patients who fit a very specific set of criteria.
Firstly, there is the issue of vocabulary mismatch. For example
“hypertension” may be noted as “high blood pressure”, “HNT” or “HBT”. To
address this we use several knowledge sources and tools to identify and
expand key medical concepts provided in queries. Secondly, medical
records often describe a patient's medical history including that of
relatives. Such issues are resolved with the use of a Condition
Attribution and Temporal Grounding system that determines if a patient
experienced a condition and if it is relevant to their current state.
Finally, in order to identify highly specific information in verbose
documents, we add structure to documents using machine learning and
rule-based methods. This structure is then leveraged using Indri's
structured query language.

This paper is structured as follows. Section 2 provides an overview of
the task as well as the Indri retrieval model. Section 3 details the
background technologies and motivation behind the systems developed at
IIRG. Section 4 describes the submitted runs and the results of their
evaluation. Discussion and conclusions are put forward in Section 5 and
Section 6, respectively.

2  Search  Task 

The goal of the Medical Records track is to foster research on providing
content-based access to the free-text fields of electronic medical
records¹. The 2012 task builds on the previous year with the aim of
identifying cohorts of patients who fit a set of criteria that describe
their medical status as well as key demographic information, known as
inclusion criteria. The queries for this task take the form of natural
language text specifying a set of inclusion criteria, e.g. Children with
dental caries. The test corpus comprises 101,700 de-identified medical
records obtained from the University of Pittsburgh NLP repository². The
unit of retrieval for this task is a visit, which is composed of
multiple records that relate to the same medical episode. A mapping
report is provided for the task in order to create these 17,265 visits.

2.1  Indri  Structured  Query  Language 

The systems described in this paper all use the Indri³ IR engine, which
combines inference model and language model approaches (Metzler and
Croft, 2004). This model was chosen as the basis for the systems
described in this paper because of its ability to handle phrases as well
as its robust structured query language. One aspect of this query
language is field restriction. Field restriction limits the matching of
an expression to a particular field found within indexed documents. For
example, the query “shakespeare.author” would ensure that documents
matching shakespeare in the author field are returned. On the corpus
side, field extents are identified using XML-like markup, e.g.
<author>shakespeare</author>.
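The relationship between a field-restricted query term and the marked-up
field extent can be illustrated with a small Python sketch. This is a
toy model of the matching behaviour only, not Indri itself; the helper
names field_restrict, mark_field and field_matches are hypothetical.

```python
import re

def field_restrict(term: str, field: str) -> str:
    """Build a field-restricted query term of the form term.field."""
    return f"{term}.{field}"

def mark_field(text: str, field: str) -> str:
    """Wrap text in XML-like markup so the field extent is explicit."""
    return f"<{field}>{text}</{field}>"

def field_matches(term: str, field: str, document: str) -> bool:
    """Toy check: does `term` occur as a word inside any `field` extent?"""
    tag = re.escape(field)
    extents = re.findall(rf"<{tag}>(.*?)</{tag}>", document, flags=re.DOTALL)
    return any(term.lower() in extent.lower().split() for extent in extents)

# A document mentioning 'hamlet' in free text and 'shakespeare' as author.
doc = "hamlet " + mark_field("shakespeare", "author")

print(field_restrict("shakespeare", "author"))
print(field_matches("shakespeare", "author", doc))
print(field_matches("hamlet", "author", doc))
```

In this sketch a term outside the marked extent (here, hamlet) does not
satisfy the restriction, which is the behaviour the field markup is
meant to support.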

3  System  Background  &  Motivation 

This section outlines the technologies and motivation behind UCD IIRG's
submissions to the TREC 2012 Medical Track.

3.1  Demographic  Information 

A key aspect in identifying patient cohorts is the resolution of
demographic information. Demographic information includes attributes
such as age and gender as well as ethnicity. Age group information is
identified in the Pittsburgh dataset through their own de-identification
process. Gender and ethnicity are extracted using a set of regular
expression rules.

¹ http://trec.nist.gov/act_part/tracks12.html

² http://www.dbmi.pitt.edu/nlp/report-repository

³ http://sourceforge.net/projects/lemur/
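As an illustration of rule-based demographic extraction, the following
Python sketch applies a handful of regular expressions to report text.
The patterns, the function names and the majority-vote rule are all
assumptions made for this example; the rules actually used in the
submitted systems are not listed here.

```python
import re
from typing import Optional

# Hypothetical cue patterns (assumed for illustration only).
GENDER_PATTERNS = {
    "female": re.compile(r"\b(?:female|woman|she|her)\b", re.IGNORECASE),
    "male": re.compile(r"\b(?:male|man|he|his)\b", re.IGNORECASE),
}
ETHNICITY_PATTERN = re.compile(
    r"\b(?:caucasian|african[- ]american|hispanic|asian)\b", re.IGNORECASE
)

def extract_gender(text: str) -> Optional[str]:
    """Return the gender with the most cue matches, or None if absent or tied."""
    counts = {label: len(p.findall(text)) for label, p in GENDER_PATTERNS.items()}
    if counts["female"] == counts["male"]:
        return None
    return max(counts, key=counts.get)

def extract_ethnicity(text: str) -> Optional[str]:
    """Return the first ethnicity cue found (lower-cased), or None."""
    match = ETHNICITY_PATTERN.search(text)
    return match.group(0).lower() if match else None

report = "The patient is a 45-year-old Hispanic female. She reports chest pain."
print(extract_gender(report), extract_ethnicity(report))
```

The extracted values could then be written into the document as field
extents so that they are available to field-restricted queries.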

3.2  Condition Identification, Condition Attribution & Temporal Grounding

In order to identify medical conditions in text, an implementation of
the IndexFinder algorithm was used (Zou et al., 2003). However, as
medical records must give a complete description of a patient's
healthcare profile, they are necessarily word-heavy. This includes a
description of familial medical conditions (e.g. father's diabetes) as
well as past medical history (e.g. past admission with wrist fracture).
A Condition Attribution and Temporal Grounding system (Cogley et al.,
2012) is used to resolve such occurrences.

3.3  Indexing 

The systems described in Sections 3.1 and 3.2 were used to mark field
extents describing age, gender, ethnic origin as well as information
regarding past, present and familial medical conditions. The markup tags
used in the system are outlined in Table 1.

Following the identification of required information, visit documents
were created. The visit documents are created using a simple shell
script to concatenate a visit's constituent reports. The script reads
the mapping file provided by TREC, which maps reports to visits using
unique identifiers. Indexes were then built from these visit documents.
The index was left unstemmed in order to avoid problem instances such as
‘AIDS’ stemmed to ‘AID’.
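The concatenation step can be sketched as follows. The original used a
shell script; this Python version is an illustrative stand-in, and it
assumes a tab-separated mapping of report identifier to visit
identifier, which may not match the layout of the TREC mapping file.

```python
from collections import defaultdict
from typing import Dict, Iterable

def build_visit_documents(
    mapping_lines: Iterable[str], reports: Dict[str, str]
) -> Dict[str, str]:
    """Concatenate report texts into one document per visit.

    mapping_lines: lines of the assumed form 'report_id<TAB>visit_id'.
    reports: report_id -> already marked-up report text.
    """
    visit_reports = defaultdict(list)
    for line in mapping_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the mapping
        report_id, visit_id = line.split("\t")
        if report_id in reports:
            visit_reports[visit_id].append(reports[report_id])
    # Reports are joined in the order they appear in the mapping.
    return {visit: "\n".join(texts) for visit, texts in visit_reports.items()}

mapping = ["r1\tv1", "r2\tv1", "r3\tv2"]
reports = {
    "r1": "<CONDITION>AIDS</CONDITION> noted",
    "r2": "follow-up report",
    "r3": "<GEN>female</GEN> patient",
}
visits = build_visit_documents(mapping, reports)
print(visits["v1"])
```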


Field Tag    Usage
CONDITION    Identifies a condition
PASTCOND     Identifies a former condition
FAMCOND      Identifies a familial condition
AGE          Identifies an age
AGEGRP       Identifies an age group
GEN          Identifies patient gender
ETH          Identifies patient ethnicity

Table 1: Fields marked in text


3.4  Structured  Retrieval 

Following the identification of fields in documents using the systems
described in previous sections, this information may be manipulated
using Indri's structured retrieval and field restrictions. As a result,
highly specific queries (1) may be translated to structured queries (2).

1.  Elderly adults with a past admission for fractures

2.  elderly.AGEGRP adults with a past admission for fractures.PASTCOND
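The translation from (1) to (2) can be sketched in Python as a simple
token rewrite. The function name to_structured_query and the term-to-tag
dictionary are hypothetical; in practice the tags would come from the
annotation systems described earlier rather than from a hand-written
mapping.

```python
from typing import Dict

def to_structured_query(query: str, term_fields: Dict[str, str]) -> str:
    """Append a field restriction to every token that has a known field tag."""
    rewritten = []
    for token in query.split():
        field = term_fields.get(token.lower())
        # Tagged tokens are lower-cased and suffixed with '.FIELD';
        # all other tokens are passed through unchanged.
        rewritten.append(f"{token.lower()}.{field}" if field else token)
    return " ".join(rewritten)

query = "Elderly adults with a past admission for fractures"
tags = {"elderly": "AGEGRP", "fractures": "PASTCOND"}
print(to_structured_query(query, tags))
```

Under these assumptions the sketch reproduces query (2) from query (1).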

3.5  Query  Expansion 

Medical literature is a rich source of synonymy and in an IR context
vocabulary mismatch is an often encountered issue (Limsopatham et al.,
2011). To combat these problems, query expansion is employed. Both
manual and automatic methods are used, as detailed below. Concept
Re-Ranking (Stokes et al., 2007), a method developed to address the
issues arising when a document contains multiple references to the same
concept term, is also applied.

3.5.1  Manual 

The query is submitted to PubMed⁴. PubMed creates chunks from this query
which represent entries in MeSH (Johnston et al., 2002). A manual
systematic lookup is then performed on these entries at
http://www.ncbi.nlm.nih.gov/mesh. All synonyms are then taken from these
results and added to the original query. No manual filtering of
appropriate terms was conducted.
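One plausible way to add synonyms to an Indri query is sketched below
in Python. It uses Indri's #syn, #1 and #combine operators, but the text
does not state which operators the submitted runs used, so this
structure is an assumption. The synonym list is hard-coded for
illustration rather than retrieved from MeSH.

```python
from typing import Dict, List

def indri_term(phrase: str) -> str:
    """Single words stay as-is; multi-word phrases become exact phrases (#1)."""
    words = phrase.lower().split()
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"

def expand_concept(concept: str, synonyms: List[str]) -> str:
    """Group a concept and its synonyms so they are treated as one term (#syn)."""
    terms: List[str] = []
    for phrase in [concept, *synonyms]:
        term = indri_term(phrase)
        if term not in terms:  # drop duplicates, keep order
            terms.append(term)
    return terms[0] if len(terms) == 1 else "#syn(" + " ".join(terms) + ")"

def build_query(concepts: Dict[str, List[str]]) -> str:
    """Combine all expanded concepts into a single #combine query."""
    parts = [expand_concept(c, syns) for c, syns in concepts.items()]
    return "#combine(" + " ".join(parts) + ")"

# Hypothetical expansions, for illustration only.
expansions = {"hypertension": ["high blood pressure"], "children": []}
print(build_query(expansions))
```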

3.5.2  Automatic 

Concepts are identified using the MetaMap tool. Expansions are then
generated using one or more of the following: the MetaMap tool; queries
to PubMed; the knowledge graph Freebase⁵. The queries are then
automatically generated using these expansions and a rule-based system
translates the queries to Indri's structured query language.

3.6  Submitted  Systems 

Four systems were submitted by UCD IIRG to this year's Medical Track.
They are as follows.

•  UCDCSI1  A manual run using MeSH based expansions, Indri's structured
query language to specify demographic information and Concept
Re-Ranking.

•  UCDCSI2  A manual run using MeSH based expansions, Indri's structured
query language to specify demographic information such as ages and
Concept Re-Ranking. Furthermore it uses field-based retrieval in order
to utilise more specific information regarding medical conditions,
namely determining the experiencer and whether or not it occurred in the
past.

•  UCDCSI3  An automatic run using MeSH based expansions, Indri's
structured query language to specify demographic information such as
ages and Concept Re-Ranking.

•  UCDCSI4  An automatic run using MeSH based expansions and Indri's
structured query language to specify demographic information such as
ages without Concept Re-Ranking.

⁴ http://www.ncbi.nlm.nih.gov/pubmed

⁵ http://www.freebase.com/

4  Experimental  Results 

This section describes the performance of the four runs submitted to the
TREC 2012 Medical Track by UCD IIRG. The submissions consisted of two
manual runs, UCDCSI1 and UCDCSI2, and two automatic runs, UCDCSI3 and
UCDCSI4. Four metrics, infAP, infNDCG, R-prec and Precision @ 10
(P@10), are used to evaluate all submissions. 88 runs were submitted in
total to the track. Of the 88 submitted, 6 were manual with the
remaining 82 automatic. Tables 2 and 3 display the results of UCD IIRG's
submissions and the hypothetical Max and Median runs, respectively.
Table 4 shows the hypothetical max and median automated runs and
UCDCSI4.


ID        infAP   infNDCG   R-prec   P@10
UCDCSI1   0.168   0.406     0.280    0.4915
UCDCSI2   0.121   0.346     0.237    0.3851
UCDCSI3   0.089   0.286     0.195    0.283
UCDCSI4   0.105   0.319     0.223    0.340

Table 2: UCD IIRG Submissions
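For readers unfamiliar with the two simpler metrics, the following
Python sketch computes P@10 and R-prec for a single topic under their
standard definitions. The visit identifiers and judgements are made up
for illustration. infAP and infNDCG are sampling-based estimates and are
not reproduced here.

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved visits that are relevant."""
    return sum(1 for visit in ranked[:k] if visit in relevant) / k

def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant visits."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for visit in ranked[:r] if visit in relevant) / r

# Hypothetical ranking and relevance judgements for one topic.
ranked = ["v1", "v2", "v3", "v4"]
relevant = {"v1", "v3", "v9"}
print(precision_at_k(ranked, relevant, k=2))
print(r_precision(ranked, relevant))
```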


UCDCSI1 had the best performance among UCD IIRG submissions across all
metrics. The manual runs significantly outperformed their automatic
counterparts, owing mainly to the much more sophisticated manner of
query structuring that uses weighting among concepts.


ID        infAP   infNDCG   R-prec   P@10
MAX       0.395   0.722     0.515    0.802
MEDIAN    0.200   0.464     0.326    0.551
UCDCSI1   0.168   0.406     0.280    0.492

Table 3: Manual Max, Manual Median & UCDCSI1


The top-performing IIRG submission achieves moderate performance, with
no significant difference between it and the hypothetical median run.


ID        infAP   infNDCG   R-prec   P@10
MAX       0.423   0.746     0.543    0.815
MEDIAN    0.170   0.424     0.294    0.470
UCDCSI4   0.105   0.319     0.223    0.340

Table 4: Automatic Max, Automatic Median & UCDCSI4


Although the automated submission UCDCSI4 performs moderately well, it
does so using very basic rules in order to translate natural language
queries into Indri's complex query language that requires the knowledge
of domain experts. The ramifications of this are discussed in the next
section.

5  Discussion 

In this section, we present a discussion of the results of UCD IIRG's
submissions to TREC Medical Track 2012, both in respect to one another
and the hypothetical max and median runs.

Figure 1 displays the per topic infAP score of the maximum and median
participant results along with the authors' top performing submission,
UCDCSI1.

UCDCSI1 matches the best performing runs on six occasions (136, 138,
143, 158, 178, 184). Of these six queries, the median also matched the
best performance on two (143, 178). Both of these queries are relatively
simple, e.g. Patients who have had a carotid endarterectomy, which
explains their high performance. UCDCSI1 outperforms the median runs on
queries with age group information, e.g. Children with dental caries,
and on queries with secondary information, e.g. Patients with esophageal
cancer who develop pericardial effusion.


In total, UCDCSI1 equals the score of the median run on 16 topics (137,
143, 145, 147, 148, 149, 152, 155, 156, 165, 168, 174, 177, 179, 182,
185). These queries ranged from quite simple criteria such as Patients
with Ischemic Vascular Disease that achieved high scores to very complex
queries such as topic 141, Adult inpatients with Alzheimer's disease
admitted from nursing homes with pressure ulcers, or queries that had
limited expansions such as Patients with inflammatory disorders
receiving TNF-inhibitor treatments that produced very low scores.

There is a significant difference in the performances of UCDCSI1 and
UCDCSI2, with UCDCSI1 achieving consistently higher scores as shown in
Table 2. These results indicate that the introduction of further
information relating to conditions in the query may be too strict.

The automatic submissions UCDCSI3 and UCDCSI4 are significantly
outperformed by the max and median runs. An important point of note is
the increase in performance of UCDCSI4 over UCDCSI3 on the removal of
Concept Re-Ranking. There are two causes for this. First, as they are
automatic runs they are susceptible to the inaccuracies of the concept
identification tool. Secondly, only medical conditions were identified
in the queries, giving them a much higher weighting than equally
important concepts such as treatments, medications and demographic
information.

6  Conclusion 

As part of TREC Medical track 2012, we investigated several problem
areas in the retrieval of medical documents. These submissions
investigated the areas of query expansion, automated query translation
from natural language to Indri's structured query language and the use
of specific demographic information about patients including their age,
gender and information relating to their medical status. Four
submissions were made in investigation of these problem areas. Two runs
were manually created with the others using automatically generated
queries. The manual run UCDCSI1 matched, on average, the median run.
However, this approach encountered difficulty with queries that had few
expansions as well as verbose queries. The use of more specific
information such as the attribution and temporal grounding of medical
conditions saw a decrease in performance. The automatic runs were
outperformed by the max and median runs, highlighting difficulties in
concept identification as well as the automated translation of natural
language queries to a structured format.

Figure 1: infAP Max, Median and UCDCSI1 scores per topic

This leads to future work in performing more accurate concept
identification as well as further refining the translation of natural
language queries. Furthermore the expansion resources used for this task
proved to be limited, thus failing to resolve fully the effects of
vocabulary mismatch.